Smart Paper Notes

Bandit problems with fidelity rewards

Gábor Lugosi , Ciara Pike-Burke , Pierre-André Savalle

Categories: Machine Learning (Statistics) | Machine Learning

2021-11-25

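The fidelity bandits problem is a variant of the $K$-armed bandit problem in which the reward of each arm is augmented by a fidelity reward: an additional payoff that depends on how "loyal" the player has been to that arm in the past. We propose two models of fidelity. In the loyalty-points model, the amount of extra reward depends on the number of times the arm has previously been played. In the subscription model, the extra reward depends on the current number of consecutive draws of the arm. We consider both stochastic and adversarial problems. Since single-arm strategies are not always optimal in stochastic problems, the notion of regret in the adversarial setting needs careful adjustment. We introduce three possible notions of regret and investigate which of them can be bounded sublinearly. We study in detail the special cases of increasing, decreasing, and coupon fidelity rewards (in the coupon case, the player receives an additional reward after every $m$ plays of an arm). For the models that do not necessarily enjoy sublinear regret, we provide a worst-case lower bound. For the models that do exhibit sublinear regret, we provide algorithms and bound their regret.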
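
The difference between the two fidelity models can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the zero-based counting convention (the bonus is computed from the number of plays made before the current round), and the particular coupon function are all assumptions chosen for illustration.

```python
from typing import Callable, List, Sequence


def loyalty_points_bonuses(plays: Sequence[int], n_arms: int,
                           fidelity: Callable[[int], float]) -> List[float]:
    """Loyalty-points model: the bonus depends on how many times the
    chosen arm was played in all earlier rounds (assumed convention)."""
    counts = [0] * n_arms
    bonuses = []
    for arm in plays:
        bonuses.append(fidelity(counts[arm]))
        counts[arm] += 1
    return bonuses


def subscription_bonuses(plays: Sequence[int],
                         fidelity: Callable[[int], float]) -> List[float]:
    """Subscription model: the bonus depends on the length of the current
    unbroken run of plays of the chosen arm (assumed convention: the run
    is counted up to, but not including, the current round)."""
    bonuses = []
    prev_arm = None
    streak = 0
    for arm in plays:
        if arm != prev_arm:
            streak = 0  # switching arms resets the run
        bonuses.append(fidelity(streak))
        streak += 1
        prev_arm = arm
    return bonuses


def coupon(m: int) -> Callable[[int], float]:
    """Hypothetical coupon fidelity: a unit bonus on every m-th play."""
    return lambda n: 1.0 if (n + 1) % m == 0 else 0.0


plays = [0, 0, 1, 0, 0, 0]  # arm index chosen in each round
loyalty = loyalty_points_bonuses(plays, n_arms=2, fidelity=coupon(3))
subscription = subscription_bonuses(plays, fidelity=coupon(3))
```

In this toy sequence the loyalty-points bonus arrives on the third play of arm 0 overall, while the subscription bonus arrives only on the third consecutive play, because the switch to arm 1 resets the run.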


We explore the downstream task performance of graph neural network (GNN) self-supervised learning (SSL) methods trained on subgraphs extracted from relational databases (RDBs). Intuitively, this joint use of SSL and GNNs should make it possible to leverage more of the available data, which could translate into better results. However, we found that naively porting contrastive SSL techniques can cause "negative transfer": linear evaluation on fixed representations from a pretrained model performs worse than on representations from the randomly initialized model. Based on the conjecture that contrastive SSL conflicts with the message-passing layers of the GNN, we propose InfoNode, a contrastive loss that aims to maximize the mutual information between a node's initial-layer and final-layer representations. The primary empirical results support our conjecture and the effectiveness of InfoNode.
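
The abstract does not give the exact form of the InfoNode loss. One common way to maximize a lower bound on mutual information between two views is an InfoNCE-style contrastive loss, and the sketch below applies that idea to initial-layer and final-layer node representations. The function names, the cosine similarity, and the temperature value are assumptions for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(initial: List[Vector], final: List[Vector],
             temperature: float = 0.5) -> float:
    """InfoNCE-style loss (an assumed stand-in for InfoNode).

    For node i, the positive pair is (initial[i], final[i]); the final-layer
    representations of all other nodes act as negatives.
    """
    n = len(initial)
    total = 0.0
    for i in range(n):
        logits = [cosine(initial[i], final[j]) / temperature for j in range(n)]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / n


initial_reps = [[1.0, 0.0], [0.0, 1.0]]
aligned_final = [[1.0, 0.0], [0.0, 1.0]]   # each node matches its own input
swapped_final = [[0.0, 1.0], [1.0, 0.0]]   # each node matches the other node
loss_aligned = info_nce(initial_reps, aligned_final)
loss_swapped = info_nce(initial_reps, swapped_final)
```

The loss is lower when each node's final-layer representation stays closest to its own initial-layer representation, which is the behavior a mutual-information objective of this kind rewards.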


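We argue that, when building and benchmarking machine learning (ML) models, the research community should favor evaluation metrics that better capture the value these models deliver in practical applications. For a specific class of use cases, selective classification, we show that doing so is not only simple but also has important consequences and offers insight into what to look for in a "good" ML model.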
